Phylogenetic trees are key to understanding evolutionary relationships. In R, the ape package (Analyses of Phylogenetics and Evolution) and the Bioconductor treeio package provide powerful tools to import, manipulate, and visualize phylogenetic trees.
These packages can be used to:
Load phylogenetic trees into R (from formats like Newick/Nexus) using ape (and treeio).
Create simple example trees.
Perform common tree manipulations: rooting/unrooting, ladderizing, pruning (dropping tips), and basic traversal of tree structure.
Visualize trees using base ape plotting (via plot.phylo) and enhanced graphics with ggtree (built on treeio).
Annotate trees with metadata (e.g. traits, taxonomy, geography) and visualize these data on the tree (e.g. coloring tips by trait).
ape provides functions to read and write tree files, to store trees in R as objects of class phylo, and to manipulate and analyze these trees. With ape, you can also compute distances, reconstruct ancestral states, and run comparative analyses, but this lesson focuses on tree handling and basic plotting. ape’s base plotting functions (like plot.phylo) produce static phylograms or cladograms using base R graphics.
Functions to read and write trees
read.tree(): reads a tree from a file in Newick format. Newick uses nested parentheses to represent the tree structure, with optional branch lengths and node labels.
Example: ((A:0.1,B:0.2):0.3,C:0.4);
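As a minimal sketch of reading and writing, the snippet below parses the example Newick string with read.tree(text = ...), inspects the resulting phylo object, and round-trips it through a temporary file with write.tree(). The object names (newick, tr_newick, tr_back) and the temporary file are chosen here for illustration only.

```r
library(ape)

# Parse a Newick string directly; read.tree(file = ...) reads from disk instead
newick <- "((A:0.1,B:0.2):0.3,C:0.4);"
tr_newick <- read.tree(text = newick)

print(tr_newick)        # summary of the phylo object
tr_newick$tip.label     # tip names
tr_newick$edge.length   # branch lengths, one per row of tr_newick$edge

# Round trip: write the tree to a temporary Newick file and read it back
tmp <- tempfile(fileext = ".nwk")
write.tree(tr_newick, file = tmp)
tr_back <- read.tree(tmp)
all.equal(tr_newick, tr_back)   # compares the two trees
```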
ape provides the plot.phylo() method, which is invoked when plot() is called on a phylo object.
library(ape)
tr <- read.tree(text = "((Pan:5,Human:5):2,Gorilla:7);")
par(mfrow = c(2, 3))  # 2 x 3 grid of panels
plot(tr, type = "cladogram", main = "cladogram")
plot(tr, type = "unrooted", main = "unrooted")
plot(tr, type = "fan", main = "fan")
plot(tr, type = "radial", main = "radial")
plot(tr, type = "tidy", main = "tidy")  # needs a recent version of ape
# Reroot the tree at a different node
# (unrooted_tree must be a phylo object with a tip labelled "Species1")
rooted_tree <- root(unrooted_tree, outgroup = "Species1", resolve.root = TRUE)
plot(rooted_tree, main = "Rooted tree at Species1")
Ladderizing a tree
Ladderizing a tree means reordering the branches so that the tree has a tidy, ladder-like appearance (at each split, the smaller clade is consistently placed on the same side).
This doesn’t change the evolutionary relationships at all; it only affects the order in which the tips are drawn.
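A minimal sketch of ladderizing with ape's ladderize(), using a random tree as a stand-in for real data. The seed and object names are arbitrary choices for this illustration.

```r
library(ape)

set.seed(1)                        # arbitrary seed so the random tree is reproducible
tr_random <- rtree(10)
tr_ladder <- ladderize(tr_random)  # right = FALSE flips the direction of the ladder

par(mfrow = c(1, 2))
plot(tr_random, main = "Original")
plot(tr_ladder, main = "Ladderized")

# Topological distance between the two trees (the splits are unchanged)
dist.topo(tr_random, tr_ladder)
```

Plotting the two trees side by side makes it easy to confirm that only the drawing order differs.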
Pruning and dropping tips
Often, you might want to prune a tree to remove certain tips (species) – for example, if you want to focus on a subset or if some tips have no data for your analysis.
You can use the drop.tip() function to remove tips from a tree.
# Create a random tree with 10 tips (labelled t1 to t10)
tr <- rtree(10)
par(mfrow = c(1, 2))
plot(tr, main = "Original tree")
# Drop tips "t1" and "t2"
pruned_tree <- drop.tip(tr, c("t1", "t2"))
plot(pruned_tree, main = "Pruned tree")
Extracting clades
Sometimes you want to isolate a subtree (clade) or inspect specific parts of a tree. ape has tools to help with that:
getMRCA(tree, tip=c(...)): finds the most recent common ancestor (MRCA) node of a given set of tips. It returns the node number (an integer index used internally in the phylo object).
extract.clade(tree, node): extracts the subtree/clade descended from a given internal node. The node number can come from getMRCA() or from a node you have already identified, for example on a plot.
nodepath(tree, from=NULL, to=NULL): returns the sequence of node indices along the path between two nodes (by default, from root to each tip). This is useful for traversal or finding ancestor lineages.
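The three functions above can be combined as in the following sketch. The four-taxon tree and its branch lengths are a toy example made up for this illustration, not real estimates.

```r
library(ape)

# Toy tree (hypothetical branch lengths)
tr_primates <- read.tree(text = "(((Pan:5,Human:5):2,Gorilla:7):6,Orangutan:13);")

# Node number of the most recent common ancestor of two tips
mrca_node <- getMRCA(tr_primates, c("Pan", "Gorilla"))

# Subtree descending from that node
african_apes <- extract.clade(tr_primates, mrca_node)
african_apes$tip.label

# Node indices on the path from the root to the tip "Human"
root_node <- Ntip(tr_primates) + 1
human_tip <- which(tr_primates$tip.label == "Human")
path_to_human <- nodepath(tr_primates, from = root_node, to = human_tip)
path_to_human
```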
Understanding node numbering
Tip nodes are numbered 1 to N, where N is the number of tips.
Internal nodes are numbered N + 1 to N + Nnode, where Nnode is the number of internal nodes; node N + 1 is the root.
The tree$edge matrix has two columns: the parent node and the child node of each edge.
Internal nodes are typically referred to by number. For example, a tree with 3 tips (numbered 1, 2, 3) and tree$Nnode == 2 has internal nodes 4 and 5.
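The numbering scheme can be checked directly on a small tree. This sketch reuses the three-taxon example tree; the object names are illustrative.

```r
library(ape)

tr_small <- read.tree(text = "((Pan:5,Human:5):2,Gorilla:7);")

n_tips  <- Ntip(tr_small)    # N, the number of tips
n_nodes <- Nnode(tr_small)   # number of internal nodes

tr_small$tip.label           # tip i in this vector is node number i
tr_small$edge                # column 1 = parent node, column 2 = child node

# Internal node numbers run from N + 1 to N + Nnode
internal_nodes <- (n_tips + 1):(n_tips + n_nodes)
internal_nodes
```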
Getting MRCA
# Get the MRCA of tips 1 and 2 (tip numbers; tip labels such as c("t1", "t2") also work)
mrca_node <- getMRCA(tr, c(1, 2))
print(mrca_node)
The simplest way to plot a tree in R is: plot(my_tree)
If my_tree is a phylo object, ape’s S3 method plot.phylo is called. By default, it plots a phylogram (branch lengths drawn to scale) with the root on the left and the tips on the right. Tips are labeled by their names, and branch lengths (if present) determine horizontal distances.
You can customize the appearance with many arguments:
type: "phylogram" (default), "cladogram" (slanted branches; combine with use.edge.length = FALSE to ignore branch lengths), "fan" (circular tree), "unrooted" (unrooted layout), "radial" (similar to fan), or "tidy".
edge.color, edge.width, edge.lty: for branch line appearance.
tip.color, cex: color and size of tip labels.
use.edge.length: if FALSE, branch lengths are ignored, so even a phylogram is drawn without length scaling.
main, sub: to add title or subtitle.
direction: "rightwards" (default), "leftwards", "upwards", "downwards" to orient the root.
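Several of these arguments can be combined in one call. The sketch below uses the bird.orders tree shipped with ape; the colors and the choice of which edges to highlight are arbitrary and only for illustration.

```r
library(ape)
data(bird.orders)

# One color per edge: the first five edges are highlighted purely for illustration
edge_cols <- rep("grey40", Nedge(bird.orders))
edge_cols[1:5] <- "firebrick"

plot(bird.orders,
     type = "phylogram",
     direction = "upwards",   # root at the bottom, tips at the top
     edge.color = edge_cols,
     edge.width = 2,
     tip.color = "navy",
     cex = 0.7,
     main = "Bird orders")
```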
ape provides functions that add to an existing tree plot:
tiplabels() adds symbols or text at the tips.
nodelabels() adds labels to internal nodes (e.g., bootstrap values or node IDs).
edgelabels() adds labels to branches.
Adding annotations in base plots
For example, after plotting, nodelabels() with no arguments will put numbers at internal nodes. This can be useful to identify node IDs for use with extract.clade, etc., by visual inspection.
data(bird.orders)  # example tree shipped with ape
plot(bird.orders, cex = 0.6, main = "Bird Orders (phylogram)")
nodelabels(frame = "circle", cex = 0.5, col = "red")
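tiplabels() and edgelabels() are used the same way, after the tree has been drawn. A small sketch on the three-taxon example tree follows; the symbol colors and label offsets are arbitrary choices.

```r
library(ape)

tr_small <- read.tree(text = "((Pan:5,Human:5):2,Gorilla:7);")

plot(tr_small, label.offset = 0.4)
# One filled symbol per tip, in tip-number order
tiplabels(pch = 21, bg = c("orange", "orange", "grey"), cex = 1.5)
# Print each branch length next to its branch
edgelabels(tr_small$edge.length, frame = "none", adj = c(0.5, -0.3))
# Internal node numbers in circles
nodelabels(frame = "circle", cex = 0.7)
```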
Setting colors for tip labels
# Set colors for tip labels
tip_colors <- ifelse(bird.orders$tip.label %in% c("Anseriformes", "Galliformes"), "red", "black")
plot(bird.orders, cex = 1, main = "Bird Orders (phylogram)", tip.color = tip_colors)
Setting colors for tip labels in a clade
Set the tip labels of the smallest clade containing both Anseriformes and Struthioniformes to red.
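One way to do this, sketched here under the assumption that "clade common to" means the clade rooted at the MRCA of the two orders, is to combine getMRCA() and extract.clade() and then build a color vector as before.

```r
library(ape)
data(bird.orders)

# Node number of the MRCA of the two orders
clade_node <- getMRCA(bird.orders, c("Anseriformes", "Struthioniformes"))

# All tip labels descending from that node
clade_tips <- extract.clade(bird.orders, clade_node)$tip.label

# Red for tips inside the clade, black elsewhere
tip_colors <- ifelse(bird.orders$tip.label %in% clade_tips, "red", "black")
plot(bird.orders, cex = 0.8, tip.color = tip_colors,
     main = "Clade containing Anseriformes and Struthioniformes")
```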
Since ggtree is built on ggplot2, you can use the usual ggplot2 layers and theme elements. For instance:
theme_tree() is a preset minimal theme for tree plots (it’s usually applied by default).
You can add + xlim(...) or ylim(...) to adjust spacing.
Use geom_point2() or geom_nodepoint() to add symbols at nodes and geom_text2() to add text at nodes; geom_point2() and geom_text2() accept a subset aesthetic (for example aes(subset = !isTip)) to restrict a layer to internal nodes or to tips. For example, geom_nodepoint(color="red", size=2) would put red dots at all internal nodes.
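A minimal sketch combining these layers on the bird.orders tree from ape. The x-axis limit of 40 is an assumed value picked to leave room for the tip labels, and the object name p_birds is illustrative.

```r
library(ape)
library(ggtree)
library(ggplot2)   # loaded explicitly for xlim()

data(bird.orders)

p_birds <- ggtree(bird.orders) +
  geom_tiplab(size = 3) +
  geom_nodepoint(color = "red", size = 2) +  # red dots on internal nodes
  xlim(0, 40) +                              # assumed limit: extra room for labels
  theme_tree2()                              # tree theme that also draws an x-axis
p_birds
```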
Highlighting or annotating clades
ggtree has dedicated layers for clades: geom_hilight(node=..., fill="yellow") highlights a clade by shading the area it occupies, and geom_cladelabel(node=..., label="Clade X") marks a clade with a bar and a text label. Both need the node number of the clade’s root, which you can get from getMRCA() or read off a plotted tree (for example interactively with identify()); viewClade() can then zoom in on that clade. These are beyond the basics, but worth mentioning as possibilities.
For example, if node 30 in bird.orders were the MRCA of a group of birds we care about (30 is just an example), we could pass node = 30 to geom_hilight() and geom_cladelabel() to shade and label that clade.
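A minimal sketch of such a call follows. Node 30 is only the example number from the text, and the label offset and x-axis limit are assumed values chosen to keep the clade label clear of the tip labels.

```r
library(ape)
library(ggtree)
library(ggplot2)   # loaded explicitly for xlim()

data(bird.orders)

clade_node <- 30   # example node number; any valid internal node works

p_clade <- ggtree(bird.orders) +
  geom_tiplab(size = 3) +
  geom_hilight(node = clade_node, fill = "yellow", alpha = 0.4) +
  geom_cladelabel(node = clade_node, label = "Clade X", offset = 8) +  # assumed offset
  xlim(0, 45)                                                          # assumed limit
p_clade
```

In practice the node number would come from getMRCA() rather than being typed in by hand.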
Annotating Trees with Metadata (Traits, Geography, etc.)
In evolutionary studies, we often have additional data for each tip (or sometimes internal nodes) — for example:
Traits of species (body size, habitat, etc.)
Taxonomic info (like grouping species into families)
Geographical distribution or host information (for pathogen phylogenies, e.g., virus host species)
Temporal data (like sample collection year)
Statistical analysis results mapped onto branches or nodes (like dN/dS ratios on branches, or ancestral state probabilities at nodes)
The goal is to attach these data to the tree and visualize them effectively.
Combining Tree with Tip Data
The treeio package defines a data structure (the treedata class) that can hold a tree plus associated data. However, we don’t need to construct that manually to use data in ggtree. Instead, ggtree provides a handy operator %<+% (think “attach data”) that attaches a data frame of information to a tree plot. The first column of the data frame must contain the tip labels so that rows can be matched to tips.
# Simulate a tree and trait data
tree_example <- rtree(5, tip.label = LETTERS[1:5])  # random tree with tips A, B, C, D, E
# Create a data frame of traits for each tip
trait_data <- data.frame(label = LETTERS[1:5],
                         Status = c("Endangered", "Not Endangered", "Not Endangered", "Endangered", "Endangered"))
trait_data
  label         Status
1     A     Endangered
2     B Not Endangered
3     C Not Endangered
4     D     Endangered
5     E     Endangered
Attaching data to the tree
Here, ggtree(tree_example) %<+% trait_data attaches the data, and then geom_tiplab(aes(color=Status)) uses the Status column to color the tip labels.
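Written out in full, the call described above might look like the following sketch. It rebuilds the example tree and data frame so that it runs on its own; the object name p_status is illustrative.

```r
library(ape)
library(ggtree)
library(ggplot2)   # loaded explicitly for aes()

tree_example <- rtree(5, tip.label = LETTERS[1:5])
trait_data <- data.frame(
  label  = LETTERS[1:5],   # first column matches the tip labels
  Status = c("Endangered", "Not Endangered", "Not Endangered",
             "Endangered", "Endangered")
)

# Attach the data frame to the tree plot, then color tip labels by Status
p_status <- ggtree(tree_example) %<+% trait_data +
  geom_tiplab(aes(color = Status))
p_status
```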
Attaching data to the tree: continuous trait example
# Simulate a tree and continuous trait data
tree_example <- rtree(20, tip.label = LETTERS[1:20])  # one label per tip
# Create a data frame of continuous traits for each tip
trait_data <- data.frame(label = LETTERS[1:20],
                         TraitValue = rnorm(20, mean = 5, sd = 2))
# Attach the data and color a point at each tip by its trait value
p <- ggtree(tree_example) %<+% trait_data +
  geom_tippoint(aes(color = TraitValue), size = 6)
p